Partitioning Register File to Reduce Access Time

Authors

Hyong-Youb Kim

Julie Rosser

Kyle Bryson

Supratik Majumder

Abstract

In a wide superscalar processor, the time it takes to execute an application depends on the instruction latency and on the amount of instruction-level parallelism (ILP) that can be extracted from the application. One important factor that influences instruction latency is the number of cycles it takes to access the register file. Whether the architecture is CISC or RISC, practically every instruction accesses the register file for one or more operands. It is therefore hardly surprising that, over the years, processor architects have designed architectures that enable single-cycle read/write access to the register file. Single-cycle accesses are also preferred because longer access times require deeper pipelines, and deeper pipelines induce more hazards, larger branch penalties, and more complex hardware for hazard detection and data forwarding.

Studies have shown that many of the ILP-increasing techniques employed in wide-issue superscalar processors increase the demand for registers [3]. At the same time, a wider issue width requires additional register file ports. Typically, a 4-wide superscalar processor needs at least 8 read ports and 4 write ports on its register file, since each instruction may read two source operands and write one result. Both trends make the register file not only much bigger but also much slower.

The access time of a register file consists of two distinct components: the wire propagation delay and the fan-in/fan-out delay. Register files typically contain long word-lines and bit-lines, which can take a long time to propagate a signal across their length. For the kind of register file structures considered here, the wire propagation delay is far greater than the fan-in/fan-out delay. A bigger register file and an increased number of ports result in a taller register file layout, which translates into longer word-lines and bit-lines [7], thereby increasing the wire propagation delay.
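
The dependence of wire delay on register count and port count can be illustrated with a deliberately simple model. The following is a hypothetical, first-order Python sketch, not the authors' method. It assumes that each instruction reads two operands and writes one result, that every port adds one word-line track and one bit-line track to each cell, and that the delay of an unrepeated RC wire grows with the square of its length. The function names `required_ports` and `wire_delay`, the 64-bit register width, and the 80- and 160-entry register counts are illustrative assumptions only.

```python
# A first-order, illustrative model of register file wire delay.
# All parameters and scaling rules are assumptions for this sketch,
# not measurements or the authors' model.


def required_ports(issue_width: int) -> tuple[int, int]:
    """Return (read_ports, write_ports) for a given issue width.

    Assumes every issued instruction may read two source operands
    and write one result in the same cycle.
    """
    return 2 * issue_width, issue_width


def wire_delay(num_registers: int, bits: int, ports: int) -> float:
    """Relative wire delay of the register file (arbitrary units).

    Assumed model:
      * each port adds one word-line track and one bit-line track per
        cell, so the cell side grows linearly with the port count;
      * a bit-line spans all registers, a word-line spans all bits;
      * an unrepeated distributed RC wire has delay proportional to
        the square of its length, and word-line and bit-line delays
        add up in series.
    """
    cell_side = 1.0 + ports  # base cell pitch of 1 unit plus 1 unit per port
    bitline_length = num_registers * cell_side
    wordline_length = bits * cell_side
    return bitline_length ** 2 + wordline_length ** 2


if __name__ == "__main__":
    reads, writes = required_ports(4)
    print(reads, writes)
    # Hypothetical configurations: 80 x 64-bit registers for a 4-wide core,
    # 160 x 64-bit registers for an 8-wide core.
    base = wire_delay(80, 64, reads + writes)
    r8, w8 = required_ports(8)
    wide = wire_delay(160, 64, r8 + w8)
    print(f"relative delay growth: {wide / base:.1f}x")
```

Running the script prints `8 4`, matching the port counts given above, followed by the modeled growth factor for the wider configuration (about 10.5x under these assumptions). In this model, the quadratic length term is what makes simultaneous growth in registers and ports so costly.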

Moreover, wire delays do not scale with improvements in silicon technology. Thus, as register files grow in size while transistors get faster (with smaller feature sizes), their delay problem is only exacerbated. Over the past decade, researchers have suggested a number of techniques for alleviating the problem of increased wire delay. Whenever a large block of silicon takes up a large fraction of the cycle time, it is common to split the block into smaller and, more importantly, faster pieces [6]. In the past, precious silicon area dictated logic reuse, but today designers frequently duplicate logic to reduce wire lengths. We believe that these two ideas, partitioning and duplication, can be applied to register files as well.
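
To make the two ideas above concrete, the following hypothetical Python sketch compares a monolithic register file against two partitioned variants under an assumed first-order wire-delay model (cell side growing linearly with the port count, delay growing with the square of wire length). One variant splits the registers across banks; the other duplicates the file so that each copy serves only part of the read ports. The names `banked_delay` and `replicated_delay`, and the 80-entry, 8-read/4-write configuration, are assumptions for illustration; the paper's actual designs and results are not reproduced here.

```python
# Illustrative comparison of partitioning strategies under an assumed
# first-order wire-delay model (hypothetical parameters, arbitrary units).


def wire_delay(num_registers: int, bits: int, ports: int) -> float:
    # Assumed: cell side = 1 + ports; delay ~ (bit-line length)^2 + (word-line length)^2.
    cell_side = 1.0 + ports
    return (num_registers * cell_side) ** 2 + (bits * cell_side) ** 2


def banked_delay(num_registers: int, bits: int, read_ports: int,
                 write_ports: int, banks: int) -> float:
    # Hypothetical banking: registers are split evenly across banks and each
    # bank keeps every port (bank conflicts are ignored in this sketch).
    return wire_delay(num_registers // banks, bits, read_ports + write_ports)


def replicated_delay(num_registers: int, bits: int, read_ports: int,
                     write_ports: int, copies: int) -> float:
    # Hypothetical duplication: each copy holds every register and serves
    # read_ports // copies of the read ports, but must accept every write so
    # that all copies stay identical.
    return wire_delay(num_registers, bits, read_ports // copies + write_ports)


if __name__ == "__main__":
    regs, bits, reads, writes = 80, 64, 8, 4  # assumed 4-wide configuration
    mono = wire_delay(regs, bits, reads + writes)
    banked = banked_delay(regs, bits, reads, writes, banks=2)
    replicated = replicated_delay(regs, bits, reads, writes, copies=2)
    for name, delay in [("monolithic", mono), ("2 banks", banked), ("2 copies", replicated)]:
        # Print each delay normalized to the monolithic design.
        print(f"{name:>10}: {delay / mono:.2f}")
```

Under these assumptions, both variants roughly halve the modeled delay relative to the monolithic file. The sketch leaves out the cost of each choice: banking needs logic to resolve conflicts when two accesses target the same bank, and duplication doubles the storage area and still requires every write to reach every copy.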

Similar sources

Partitioned Register File Designs for Clustered Architectures

The clustered architecture, where the conventional monolithic register file is partitioned into several smaller register files, is one of the candidates for future high-performance processor architectures. Aggressive partitioning can reduce the access time of the register file. On the other hand, partitioning causes a loss in instructions per clock cycle due to communication among re...

Reducing Register Pressure Through LAER Algorithm

When modern processors keep increasing the instruction window size and the issue width to exploit more instruction-level parallelism (ILP), the demand for a larger physical register file also increases. As a result, register file access time represents one of the critical delays and can easily become a bottleneck. In this paper, we first discuss the possibilities of reducing register pres...

Improving Register File Banking with a Power-Aware Unroller

The complexity of the register file determines the cycle time of high performance wide-issue microprocessors due to the access time and size of this structure. Both parameters are directly related to the number of read and write ports of the register file. Therefore, it is a priority goal to reduce this complexity in order to allow the efficient implementation of complex superscalar machines. T...

Reducing Operand Transport Complexity of Superscalar Processors using Distributed Register Files

A critical problem in wide-issue superscalar processors is the limit on cycle time imposed by the central register file and operand bypass network. In this paper, a distributed register file architecture that employs fully distributed functional unit clusters is presented. It utilizes a local register mapping table and a dedicated register transfer network to support distributed register operat...

Compiler-assisted power optimization for clustered VLIW architectures

Clustered VLIW architectures solve the scalability problem associated with flat VLIW architectures by partitioning the register file and connecting only a subset of the functional units to a register file. However, inter-cluster communication in clustered architectures leads to increased leakage in functional components and a high number of register accesses. In this paper, we propose compiler ...

Publication date: 2003
